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Abstract: 

We propose a rigorous and effective way to compare experimental and theoretical histograms, 
incorporating the different sources of statistical and systematic uncertainties. This is a useful tool 
to extract as much information as possible from the comparison between experimental data with 
theoretical simulations, optimizing the chances of identifying New Physics at the LHC. We illustrate 
this by showing how a search in the CMSSM parameter space, using Bayesian techniques, can 
effectively find the correct values of the CMSSM parameters by comparing histograms of events with 
multijets + missing transverse momentum displayed in the effective-mass variable. The procedure 
is in fact very efficient to identify the true supersymmetric model, in the case supersymmetry is 
really there and accessible to the LHC. 
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1. Introduction 

The LHC is already probing new physics beyond the reach of past experiments. At any 
stage of this enterprise, i.e. with the available data at any time, there are two main 
questions to address: 1) Is there any signal of New Physics (NP)? and 2) In the positive 
case, which NP is it? In order to optimize the answer to these questions there is an intense 
activity to explore assorted strategies for the search of NP. The task is challenging, due in 
part to the fact that LHC data, though very rich, are not as clean as those from an e~^e~ 
collider. Besides, the theoretical calculations are also subject to great uncertainties and 
rely to some extent on Monte Carlo simulations. 
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Most of the LHC data can be organized in form of histograms with number of events of 
a certain kind (e.g. those presenting multijets + missing transverse momentum) displayed 
in different variables |2|, |3|: M^g, p™'^^, ut, etc. In many cases the comparison with the 
simulations is done just by comparing the total number of events after performing different 
cuts in the variables involved. In this way, both ATLAS and CMS have already posed 
meaningful bounds [Q, ^ on NP scenarios, in particular on the simplest supersymmetric 
model, the so-called Constrained Minimal Supersymmetric Standard Model (CMSSM) Q. 
More precisely, the most powerful bounds on the CMSSM have been obtained by consider- 
ing events with several jets -|- missing transverse momentum. Somehow, the study of the 
total number events, choosing different cuts on the sets of data, amounts to partially com- 
pare the shapes of the experimental and the theoretical (simulated) histograms; although 
in a way which is not optimal. 

As mentioned above, even if we are quite sure to have a signal of NP, we face the 
problem of identifying the model producing such signal. Of course the variety of scenarios 
of NP is enormous, which makes the job very complex. Even playing in the framework of 
a given scenario, such as the CMSSM, the sole study of the number of events of a certain 
kind is not enough to determine the parameters of the model, due to the existence of 
big degeneracies in such determination. Again, this situation can be improved by probing 
different cuts in the sets of data. But, once more, this is not an optimized way of comparing 
theory and experiment, since the richness of the data is not completely exploited. 

The goal of this paper is to propose an effective and rigorous way to compare ex- 
perimental and theoretical histograms, incorporating the different sources of uncertainty 
involved in the task. In our opinion, in an experiment with the characteristics of the 
LHC this is a useful tool to extract as much information as possible from the comparison 
between experimental data with theoretical simulations. We illustrate this usefulness by 
showing how a search in the CMSSM parameter space, using Bayesian techniques, can 
effectively find the correct values of the CMSSM parameters by comparing histograms of 
events with multijets -|- missing transverse momentum displayed in the M^q variable. This 
procedure could be very efficient to identify the true supersymmetric model, in the case 
supersymmetry is really there and accessible to the LHC. 

In section 2 we establish the notation and the statistical basis for the rigorous com- 
parison between the experimental and the theoretical histograms. Section 3 is devoted to 
the incorporation of extra sources of uncertainty, in particular systematic ones. At the end 
of this section we give our final formula for the complete likelihood of a theoretical model 
by histogram comparison. In section 4 we illustrate the proposed technique by showing 
how a search in the constrained-MSSM parameter space, using Bayesian techniques, can 
effectively find the correct values of the MSSM parameters by comparing histograms of 
events with multijets + missing transverse momentum displayed in the effective-mass vari- 
able. But, of course, the technique can be applied to any scenario of new physics. Our 
conclusions are summarized in section 5. Finally, in the Appendix we show how our final 
formula for histogram-comparison is (slightly) modified when the effective luminosity of 
the theoretical simulation is not the same as the experimental one. 
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2. Comparison of histograms. Statistical uncertainties 



2.1 Basic Ingredients and Notation 

Suppose we have experimental data, e.g. multijet + missing transverse momentum events 
at LHC, organized in an histogram upon some variable M, e.g. the effective mass of the 
events, as defined in ref. Q. Let us call K the number of bins of the histogram. Each bin 
corresponds to a (central) value of the effective mass, Mj. We will denote the bin contents 
(number of events for each Mi) by Vi. The total number of events is w = Vi. 

Leaving apart for the moment all sources of systematic uncertainties, the probability 
that the experiment produces the actual data, Vi, is given by a Poisson distribution 



K 



nv^) = \{ ^e--, (2.1) 



where are the expected values (or "means") of the distribution. The values of are in 
principle calculable (at some degree of precision) provided we knew the theory responsible 
for them, e.g. the Standard Model. But we are precisely trying to uncover unknown NP, 
therefore Vi are unknown. 

On the other hand, working within a scenario of NP defined by some parameters, 
da (for example the parameters of the CMSSM), we can in principle calculate the means 
under, supposedly, the same conditions of energy and luminosity as the experiment. We 
will denote these theoretical means. Of course, depend on the point in the parameter 
space, i.e. the precise model under consideration. If the model is the true one, then Vi = Hi. 
This is the so-called "null- hypothesis" . The likelihood of a point of the parameter space is 
the corresponding probability of producing the observed data, Vi, under the null- hypothesis, 
i.e. 

K Vi 

Viv^) = T\ ^e-^\ (2.2) 

The likelihood is a crucial quantity to compare the viability of the different regions of the 
parameter space, both in frequentist and Bayesian analyses (see, e.g. 0). In particular, 
in Bayesian analyses one is interested in determining the probability density of a point 
of the parameter space, 6a, given an experimental set of data (in our case, Vi). This is 
the so-called posterior probability density function (pdf), p(0a|data), which is given by the 
fundamental Bayesian relation 

p(ejdata) = p(data|0a) piOa) ^^^^ • (2-3) 

Here p{AaXa\6a) is the above-mentioned likelihood, i.e. the probability of obtaining the 
observed data if the model defined by the 9a parameters is the true one; while p{Oa) is 
the prior, i.e. the "theoretical" probability density that we assign a priori to the point 
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in the parameter space; and p(data) is a normalization factor that ensures that the total 
probability is one. 

In order to compute the likelihood ( |2.2D we need the theoretical means, /Xj. However, in 
practice one does not have at disposal a complete evaluation of /Xj, but rather a simulation 
of the process using diverse computation codes. The results of the simulation can also be 
organized in an histogram with K bins, associated with the same values of the effective 
mass, Mj. The bin contents of the simulation are denoted by Uj, with total number of 
events u = Y2i ^i- Of course, the values of Ui obey also a Poisson statistics 

r{ui) = l[ ^e-'^^ (2.4) 
1=1 

Here we have again left aside for the moment all sources of systematic uncertainties asso- 
ciated with the theoretical simulation. 



2.2 Computation of the likelihood 

As mentioned, usually the codes provide values for Uj, but not for /i^. If we had enough 
computation time we could obtain a good evaluation of the theoretical means, /Xj, since, 
increasing the statistics, the bin contents would approach the mean values with decreasing 
relative uncertainty. This would be practical if we knew from the beginning which specific 
model we want to test, but this procedure is not efficient if we want to scan the parameter 
space, testing thousands or millions of models (points in that space). So, identifying Ui 
with /Xj is not justified unless Ui is large. The relation between them is given by eq.(p.4|). 
Since we are not sure about the values of ^j, we cannot directly calculate the likelihood 
V{vi) from eq.( |2.2| ). The best we can do is calculate V{vi\ui)^ i.e. the probability of 
getting the experimental data, Vi, under the assumption that the model is the true one 
(null- hypothesis), given that the simulation has produced Ui, 

K 

P{vi\ui) = / n dniV{vi\ni)V{ni\ui). (2.5) 



Here V{vi\fii) is given by the Poisson distribution (|2.2D and V{fii\ui) denotes the probability 
that the theoretical means are /Uj, given that the simulation has produced the Ui-histogram. 
V{^i\ui) is not known, we must infer it using the Bayes theorem, 

where V{ui\fJ:i) is the probability for each individual bin, given by the Poisson distribution 
( |2.4| ), and V{fJ-i) is the prior for /Xj. Since V{ui\^i) is peaked around Ui = /ij, the dependence 
on the prior, V{fii), is small, but nevertheless it is there. The simplest procedure here is to 
take a flat prior for V{fii). Then the V{fJ:i) cancels in the numerator and the denominator 
of eq.(p.6|) (the latter becomes simply 1), and we can identify 

V{^l^\u,i) = V(u^\^^i). (2.7) 
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Now eq. (|2.5D reads 

P{vi\u^) = [fl df.ii^e-'^'i^e-^^ ^Yl^JM + v^ ^-i^u.-.. _ (2.8) 

J Vi\ Ui\ ^\ Uil Vi\ 

1=1 1=1 

This formula represents our best estimate the hkehhood, although is only valid when the 
non-statistical sources of uncertainty, both in the experimental and in the theoretical side, 
are ignored (they are incorporated in the next section). Note that expression (^]^) avoids 
the problem of the empty bins in the theoretical simulation. In other words, if one simply 
identified = Ui, then the presence of an empty bin (ui = 0) would make the whole 
likelihood - eq.( |2.2| )- vanishing. Therefore the V{fii\ui) piece in the calculation of the 
likelihood, eq.(p.5[), is important, at least for bins with low statistics. 



2.3 Separation of normalization and shape tests 

Suppose for a moment we could calculate all fii with great accuracy and that we keep ignor- 
ing other sources of uncertainties different from the statistical ones. Then, the likelihood 
is simply given by the Poisson distribution V{vi), as given by eq.(^.2|). 

Now, it is interesting that that expression can be separated in a test for the global 
normalization (the total number of events) and a test for the shape. Namely 

P{vi) = W ^e-f"' = P(norm) x P(shape), (2.9) 
i=i 



where 



■p(norm) = —e~^ with u =/ Ui, 

7) z — / 



p(shape) = n ^^^^ = ^ n (^'^-y'^ (^.lo 



with 



i=i ' r=i ^' 



K 



V=—\\—, = const. (2.11) 



1=1 



Notice that both 'P(norm), P(shape) are proportional to terms given by Poisson dis- 
tributions. In particular, P(shape) is proportional to a Poisson distribution, where the 
means /ij are re-normalized so that they would fit perfectly the total number of events: 

K 

7'(shape) = V\[{iii-r^ 
i=\ ^ 

K K (.,.v\v. 



i=l i=l 
K 



^fl^^^i. (2.12) 

i=\ 
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This expression is really independent of the global normalization, i.e. if we make Hi — ?• af^i, 
then 7^ (shape) remains the same. This also tells us that if we fix the shape of a simulated 
histogram and allow to change its global normalization (i.e. we allow /ij — t- a/ij), the total 
probability (2.9) is always maximal when the global mean, //, coincides with the total 
number of events, v. 

The interesting thing about separating normalization and shape tests is that one can 
treat the extra sources of systematic uncertainty for both in a separate way, as will become 
clear in the next section. For instance, one may consider that the amount of systematic 
uncertainty in the global normalization is larger than in the shape, and hence it is useful 
to separate the two tests. 



3. Incorporating other sources of uncertainty. Systematic errors 
3.1 General strategy 

There are several sources of uncertainty in the comparison of the experimental data with 
the theoretical predictions. First, there is the statistical uncertainty, associated to the 
Poisson distributions, which has been the subject of the previous section. Besides there are 
additional sources of systematic uncertainty, both in the experimental side (the resolution 
and the scale of jets and missing transverse momentum, b-tagging, pile-up, etc.) and in 
the theoretical one (K-factors, parton distribution functions, etc.). However, for practical 
purposes, we can treat the experimental data as if they were free from systematic errors 
and "absorb" all the experimental systematic uncertainty in the theoretical side. 

We will call ^f^ the means that, in the simulation process, have produced the theo- 
retical (uj) histogram. Now, due to the systematic uncertainty, we cannot identify them 
directly with the "true means", //j, which are the real ones associated with the model un- 
der consideration, and thus the ones that, supposedly, have "produced" the experimental 
histogram (vi) under the null- hypothesis. The relation between them can be expressed as 

fXi{M) = F{Mi) , (3.1) 

where F{M) is some "transfer function" on the effective mass (M) that encodes all (exper- 
imental and theoretical) systematic uncertainties. This function can depend on a number 
of unknown parameters, though we know it cannot be completely arbitrary (below we give 
an ansatz for F{M)). 

Now, in analogy with eq.( |2.5| ), the best estimate for the likelihood is 

P{vi\ui) = j DF j Dfii'' V{vi\^ii) V{iif\ui) P(/Xi|;uf ), (3.2) 

where V{ijti\fif^) = 'P{F) is still to be guessed, and the integration measure DF is written 
in a symbolic form. The first two factors in the integrand are statistical probabilities, as 
in (^.5|). The third factor contains the systematic uncertainty (if we decided to ignore it, 
then we would simply take 'P(/^j|/^*^) = 5{F — 1)). 
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We can write explicit expressions for the three factors in eq.(|3.2D. The first factor ,7^ 



is given by the Poisson distribution (2.2). Regarding the second factor, recall that (taking 
a fiat prior for jj}^) we can identify 

K / th\u- 

Finally, we have to make ansatz for the F function and its probability, P(/ij|/x-^) = V{F). 
Since it is convenient to separate the uncertainties associated to the global normalization 



and to the shape, we express eq. (|3.lD as 

lii = F{Mi) /xf = f gi . (3.4) 

Here / and gi carry the uncertainty in the global normalization and in the shape, respec- 
tively. With this definition, gi obey the relation 

i i 

i.e. the gi parametrize systematic errors that modify the shape of the histogram without 
changing the total number of events. The situation f = g^ = \ corresponds to the absence 
of systematic errors, but we have to assign a non- vanishing probability to the possibility 
that /, gi depart from that ideal situation. Thus we write 

V{,l,\^^f)^V{f,g^) = V{f)V{g). (3.6) 

For the moment we do not write a concrete ansatz for V{f), V{g) (this is postponed to the 
next subsection). So, the likelihood (3.2) is given by 



I D^j't I DfDg (j[ t^e-A (f{ ^e"'^*'^ j nf)V{g), (3.7) 



V{vi\ui 



In this expression Df, Dg are symbolic ways to express integration over all the possibilities 
for /, gi. 

3.2 Ansatze for the transfer functions 

In eq.(^.4|) we have written the "transfer" function, F, that encodes the systematic uncer- 
tainty, as 

F{Mi) = f gi, (3.8) 

but so far we have not established on which parameters the F-function -and thus the 
quantities /, gi- depend. A simple and handy choice for practical purposes is to take the 
very values of {f,gi} as those independent parameters. Alternatively, since systematic 
errors must depend on M in a smooth way, we could parametrize F{M) as a smooth 
function, e.g. F ^ f X^a^a-^aj where Pa are ~ Legendre Polynomials and the summation 
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contains just a few terms. Then, the F function would be defined by the coefficients 
(together with the global normalization factor, /). This would be sensible, but it leads to 
very cumbersome expressions, difficult to handle. On the other hand, since in practice Vi 
and Ui are both quite smooth (apart from statistical noise), only sets of values of F[Mi) that 
vary smoothly with M can lead to a simultaneous fit of both histograms. In other words, 
chaotic values of F{Mi) (or, equivalently, gi) will be strongly penalised by the V{vi\^i) 
piece (first factor in eq.(|3.7[)). So, even if those eccentric choices for gi are not specially 
penalised by V{g), they are by other factors in the likelihood and become irrelevant. In 
consequence, choosing {f,gi} as independent parameters is a reasonable option 

Concerning the integration measures, we could simply take Df = df, Dg = W^dgi. 
However, since {f,gi] are defined as multiplicative factors in eq. (|3.4D , it seems much more 
sensible to use their magnitudes as the actual unknowns. This is equivalent to choose 
{In/, ln(7j} as the independent parameters. Then, 

1 ^ 1 

Df = -df, Dg = ll-dgi. (3.9) 

/ 9i 

Of course, since {f,gi} are never far from 1, it does not make a big difference to use {f,gi} 
or {In f, In gi}, but it can be checked that the second option leads to a more stable and 
satisfactory test. Note that, in principle, the gi variables are subject to condition ( |3.5| ), so 
there are in fact K — 1 independent gi variables. However, for the moment we have ignored 
such complication in writing (p.Sj). 

Finally, concerning the probabilities V{f), V{g), we can take them as gaussians cen- 
tered around f = gi = 1. The argument of these gaussians must be essentially the 
"squared-distance" of {f,gi} to their central values, i.e. V{f) ~ exp{— i(/ — 1)^} and 
V{g) ~ exp{-i / dM{g{M) - Vf} ~ exp{-i Y^iidi - ^f}- A nice fact here is that V{g) 
appears naturally factorized as Y\i'P{gi), which is very convenient for analytical manipu- 
lations. 

A suitable (and equivalent at first order), way to express these ansatze is by using the 
logarithmic variables, {In/, In^j}: 

P(/) = -=— e ^V-/; , (3.10) 



1 _ly- 

V{g) « -^e A, ; ^ (3.11) 

a 

where the widths Aj, measure our degree of ignorance about the magnitude of f, gi. 
Note that the use of logarithmic variables allows to maintain the whole range of integration 
of the gaussians, [—00,00], without artificial cuts to keep {f,gi} positive. 

In any case, we will go as far as possible in the analysis without specifying the precise 
ansatze for V{f), 'P{g). 
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3.3 Separation of normalization and shape tests 



Coming back to our expression (3/7) for the hkelihood, we note that the first factor of ( |3.7| ) 
may be decomposed, as in eqs. (|2.9D -( |2.10| ), into a factor for the global-normahzation and 
another for the shape: 




K 



Vflifi^-r, (3.12) 



where V is given in eq.( 2.11| ). Since the total number of events is normally large the 



global-normalization factor can be approximated by a Dirac delta, 

^e-^ ^ 5(/. -v) = 5{ff,"^ -v) = ^6{f - v/f,'^). (3.13) 
Analogously, the second factor of (|3.7| ) can be written as 

^ ( th\u ^ 

n ^-^--'^^ - - n) X ^ n (3-14) 

with 



1=1 ' i=i ^ 



! ^ 1 

U = ^uJ\-^= ^o^^t- (3-15) 

i=l * 



We can use the presence of these deltas to extract pieces of the integrand of eq.( |3.7| ) outside 
the sign of integration. Hence, 



V{v,\u.i) oc V{f = -) 
u 



^ i' 1) „ ,,th\Vi 



X I D,f^Dg[jl^-^^^ (3.16) 
Note that we have made explicitely the integration in J Df = J {l/f)df, but not in J dfj,^^. 



However the implicit presence of the 6{^^^ — u) in the integrand, as expressed in eg. ( 3. 14 ), 
has allowed us to replace /i*'^ — )• u in a consistent way. 

Assuming in the previous expression that Dg and V{g) are factorizable as products 
of K factors, like in eqs. (^, ( PH) , makes much easier the integration in practice. As 



mentioned, gi are subject to condition (|3.5| ), so strictly speaking we only have X — 1 in- 
dependent gi variables and this factorization is not complete. In spite of this, assuming 
a complete factorization is a sensible and good approximation. The reason is the follow- 
ing. In eq.(3.16) the Poisson distribution in the first factor of the integrand only departs 



appreciably from zero when YldifA^ ~ ti ~ fi''^. This can be checked by doing again a de- 
composition of such distribution as in ( |2.9| , p. 10 ) and noting that the global- normalization 



piece of the decomposition is essentially a Dirac delta, (5(^ ^9il^t ~ ''^)- even assuming 
that Dg and Vig) are factorizable, and thus integrating over sets of {gi,g2, ■■■gK} which do 
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not respect (|3.5D , the Poisson distribution in the first factor of the integrand causes that 
only those sets that obey condition ( p.5| ) will contribute appreciably to the integral. Alter- 
natively one could understand this procedure considering that in the expression ( p.4[ ), the 
Qi variables that encode Mj— dependent systematic errors can also distort the total number 
of events. This is in fact a quite realistic situation. Then, the relation ( |3.5D is not to be 
imposed and Dg and Vig) become factorizable as products of K factors (strictly speaking). 
The trouble is that the previous separation between normalization and shape cannot be 
done exactly. But, if the gi only amount to slight distorsions of the total normalization (in 
other words, V{g) penalizes much more severely the variation in the normalization than 
V{f)) the separation ( ^.14 ) is a good approximation and ([3.161) is valid. 

Now, taking profit of the factorization of Dg and V{g) we can make explicitely the 
integration in the variables, with no need of specifying the ansatze for V{f), 'P{g): 

V{vi\ui) (X V{f = -) 
u 



K 

X 



n ( / (-».)"' (i + -f.) ■ (3.IV) 

V Ui\Vi\ J gi\u J \ u J J 

In this expression, the factor of the first line, V{f = ^), carries the test for the global 
normalization: it is only sensitive to the mismatch between the experimental total number 
of events, v, and the theoretical one, u. The remaining factor (second line) corresponds to 
the test of the shape. It is interesting to check that indeed, for given u, v, this expression 
has a maximum at Ui = {u/v)vi. 

Eq.( |3.17] ) represents our final expression to evaluate the likelihood of a simulated his- 
togram, Ui, confronted to the experimental one, Vi. (A modified version is given in eq.(|6.4|) 
of Appendix A to incorporate the fact that the luminosity of the simulated histogram may 
be different from that of the experimental one.) This expression amounts to realize K 
integrals, which can be done numerically at low cost in computing time, even if one needs 
to probe thousands or millions of histograms, corresponding to points in the parameter 
space of a theoretical scenario. All this is illustrated in the next section. 



4. Application to the CMSSM 
4.1 Set up 

In this section we apply the previous histogram-comparison techniques to the study of the 
Minimal Supersymmetric Standard Model (MSSM) |jl^. More precisely, we will consider 
the somehow standard framework, often called CMSSM or MSUGRA, in which the soft 
parameters are assumed universal at a high scale (Mx), where the supersymmetry (SUSY) 
breaking is transmitted to the observable sector; as happens e.g. in the gravity-mediated 
SUSY breaking scenario. Hence, our parameter-space is defined by the following parame- 
ters: 

{9i} = {m,M,/2,A,B,fi,s} . (4.1) 
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Here m, Mx/2 and A are the universal scalar mass, gaugino mass and trilinear scalar 
coupling; B is the bilinear scalar coupling; ^ is the usual Higgs mass term in the super- 
potential; and s stands for the SM-like parameters of the MSSM. The latter include the 
SU{3) X SU (2) X U (l)y gauge couplings, g3,g, g' , and the Yukawa couplings, which in turn 
determine the fermion masses and mixing angles. All the initial parameters are defined at 
the Mx scale. 

The goal is to scan the CMSSM parameter space, determining the most probable region 
in it, given the available (present or future) experimental (mainly LHC) data. To show the 
power of the histogram-comparison technique, we will simulate LHC data assuming that 
nature lives in a standard benchmark SUSY model. This simulation will be considered 
as our (mock) experimental data. Then we will scan the CMSSM parameter space using 
Bayesian techniques to find out the most probable region of parameters, showing to which 
extent the histogram-comparison between the mock data and the theoretical prediction is 
capable to determine the "true" model. 

As mentioned in sect. |2|, in a Bayesian analysis the most important quantity is the 
posterior probability density function (pdf) in the parameter space, which is given by the 
fundamental Bayes' relation 

p{s,m, Ml /2, A, B, fi\data) oc p{deita\s,m, M1/2, A, B, fj,) p(s,m, M1/2, A, B, fi), (4.2) 

where the first factor in the right hand side is the likelihood (probability of measuring the 
observed data assuming that the point in the parameter space is the true model) and the 
second one is the prior (probability assigned to that point before knowing the experimental 
data). Next, we discuss the precise form of the pdf ( [4.2| ) for the problem at hand. 

First of all, it should be noticed that very often in statistical problems not all the 
parameters that define the system are of the same interest. The usual technique to eliminate 
the less interesting ones from the problem is simply marginalizing them, i.e. integrating 
the pdf (^]^) in those variables (for a review see ref. This is the standard procedure to 
deal with the nuisance parameters {s}. Besides, for the purposes of scanning the CMSSM 
parameter space it is convenient to trade some of the initial parameters (4.1) by others 



with more direct phenomenological significance. We follow here the approach expounded 
in detail in refs. As usual, the value of /i can be traded by the value of Mz using 

the minimization conditions of the Higgs potential. The Yukawa couplings can be traded 
by the physical fermion masses; in particular the top Yukawa coupling, yt, can be traded 
by the top mass, m^. Finally, it is highly advantageous to trade the initial i?— parameter 
by the derived tan/3 = {Hu) / {H^) parameter, where Hu^Hd are the Higgs doublets of the 
MSSM^ 

Consequently, to write the pdf in the new variables one should compute the Jacobian, 
J, of the transformation 

Ui^yuB] ^ {Mz,mt, tan /3}. (4.3) 



^This change of variables still leaves the sign of ^ undetermined. For simplicity we have assumed a 
positive ^1 in the rest of the paper. 
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On the other hand, the /Lt-parameter (now traded by Mz) can be easily marginahzed, 
together with the nuisance parameters, taking profit of the high precision of our knowledge 
of M^^. Consequently, the final expression for the posterior ( [4.2[ ) in the new variables is 

p(m, Ml/2) ^) tan /3|data) ^ </|^i=^z p(data|m, M1/2, tan/3) 

X p{m,M,A,B,fi = ^iz), (4.4) 



where J is the Jacobian of the transformation ( [4 .31) and fj,z is the value of fj, that reproduces 
for the given values of {m, M, A, tan Let us discuss in order the three factors of 
the r.h.s. of 



The Jacobian factor, J, has to be evaluated using the (radiative) electroweak breaking 
conditions of the CMSSM. For the numerical analysis we have computed J using the 
Sof tSusy code ||l^ which implements the full one-loop contributions and leading two-loop 
terms to the tadpoles for the those conditions, with parameters running at two-loops. 
This essentially corresponds to the next-to-leading log approximation. A quite accurate 



analytical expression of J, corresponding to the leading log approximation, reads ||10 

y tan /S^ - 1 B\ow 



E 



yiow tan /3(1 + tan^ /3) /i^ 



(4.5) 



Here y denotes the top Yukawa coupling and the "low" subscript indicates that the quantity 
is evaluated at low scale (say Qiow = typical supersymmetric mass). and E are RG 
quantities, involved in the one-loop running of /i and y: 

yE{Q\ow) 

MIOW = R^i Mow - -I , r. rpf^ 7: (4-6) 

where Q is the renormalization scale and F = ElnQ. and E are definite functions 

that depend just on the top Yukawa coupling and the gauge couplings, respectively, 

The important point about the Jacobian is that it does not represent any subjective 
prior on the parameters. Such subjectivity is still contained in the prior factor that stands 



in the second line of eq.(4^). The Jacobian is simply a consequence of scanning the MSSM 
parameter space in some variables, which are not the initial ones, but derived quantities. 
Another important point is that J automatically incorporates a penalization of the regions 
that require fine-tuning in order to reproduce the correct electroweak scale (typically regions 
of large soft parameters), as well as a penalization of large tan/3, reflecting also the fine- 
tuning needed to implement such possibility. A more detailed discussion of these issues 
can be found in refs. pA| . 

Let us discuss now the second factor of the r.h.s. of eq.( |4.4D , i.e. the likelihood. This 
consists of a product of likelihoods corresponding to the experimental observables used 
in the analysis. For the present one, we just consider (besides the value of M^"^) the 
experimental bounds on the masses of supersymmetric particles and the lightest Higgs 



boson (see ref.|15] for details), and the mock LHC data of multijet events plus missing 
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transverse momentum, which is the main focus of this section^. More precisely, for the 
sake of the simulated LHC data we work under the hypothesis that nature lies in the so- 
called SU9 benchmark point, defined in ref. p. This is specified by the following values 
of the CMSSM parameters: 

m = 300 GeV, M1/2 = 425 GeV, A = 20 GeV, tan/3 = 20, > 0. (4.7) 

The corresponding values of the squark mass (first two generations) and the gluino mass 
are niq = 920 GeV, Mg = 994 GeV. This point of the CMSSM parameter space is on 
the verge of being excluded by the last analyses by ATLAS and CMS [0, 0. Of course, 



assuming the SU9 point is just an example to show the histogram-comparison technique 
at work, combined with Bayesian analysis. The LHC simulation has been performed using 
Pythia version 6.419 with events generated at Ecm = 14 TeV, and selecting those 
satisfying the following cuts^: 



• Three or more jets with px > 30 GeV and \ri\ < 3.0. The hardest with pT > 180 GeV 
and \r]\ < 1.7, the second with > 110 GeV . 

• p™^^ > 200 GeV . 

• A4>i > 0.3, A(/)2 > 0.3, A(/>3 > 0.3 . 

• y/A(j)'i + {tt- A(/)i)2 > 0.5, y/A(l)'i + {tt- A(j)2V > 0-5 and A(j)2 > 7r/9 . 
, Ht = Ei=2^'T +^'r''' > 500 GeV . 



where A(/)j = A(/>(jetj — p^^^^). Concerning the luminosity we have considered 10^ super- 
symmetric events, upon which we impose the previous cuts. Since the total cross section 
for SUSY production in the SU9 model is 2.4 pb, this corresponds to a luminosity of about 
4.2 fb-^ The histo gram of number of events as a function of the effective mass, Mgff, is 
shown in Fig. |l|, where only the SUSY events have been displayed. For each event, M^s is 
defined as Q 

Mes = J2\P'T\+Pf""- (4-8) 
j 

where j runs over all jets satisfying the previous cuts. Note that the latter effectively imply 
a lower bound Meg > 680 GeV for the events considered. 



^Most of the electroweak precision tests are currently being surpassed by LHC data. Other observables, 
hke 6 — >■ s, 7 would have a moderate impact in the analysis, but we want to focus on the impact of the 
LHC, which is already the dominant part of the likelihood. For the same reason we have not included the 
somewhat controversial g — 2 data or Dark Matter constraints. 

^We have followed the strategy given in sect. 13.5 of ref. M 
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Figure 1: Effective mass distribution expected for the SUSY SU9 model without and with cuts 
applied as given in subsect. 4.1. 



Using the notation of sects. ^ the bin contents of the "experimental" histogram of Fig. || 
are the Vi quantities. For each point scanned in the parameter space we compute a simulated 
histogram (the Uj quantities in sects. ^, |3|), and then evaluate the likelihood through eq. 



( 3.17 ). In order to use that expression, we have to specify the V{f), V{g) functions, which 
encode the systematic uncertainty assigned to the total number of events and the shape 
of the histograms respectively. In our case, we have used for them the gaussian profiles 
(3.10), ( p. 11 ). The width of the first, Af, reflects the uncertainty in the total number 
of events due to (mainly) theoretical uncertainties associated with the K-factors and the 
parton distribution functions. We have been pretty conservative, assuming = 0.5; in 
other words we accept that a factor 2 or 1/2 in the total number of events is plausible^. 
On the other hand, from eq.( 3.1l| ), Ag goes as the typical systematic uncertainty in the 
shape times ^/K. For instance if, once the uncertainty affecting the global normalization is 
extracted, one estimates that there remains a systematic uncertainty in the shape which is 
of order 10% for every bin, then one has to use Ag ~ O.lV^ to reproduce that uncertainty 



at la. In our case the total number of bins is K 
reasonable choice. 



10, so we estimate 



0.2 as a 



*In the present scan we are using a tree-level Pythia simulation of the parton events; which implies 
an important uncertainty about the K-factors. In a 1-loop-refined simulation this uncertainty could be 
assumed smaller. 
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To conclude our discussion of the likelihood, let us note that the total cross section 
varies from point to point in the CMSSM parameter space. This implies that considering 
10^ initial supersymmetric events does not correspond to the same luminosity for each 
point scanned in the parameter space. But of course, the comparison of histograms must be 
realized under the same conditions of luminosity. This feature can be easily incorporated 
into the histogram-comparison technique discussed in sect. ^, leading to a slight and 
straightforward correction in the expression ( |3.17| ) for the likelihood, which now becomes 
eq. (|6.4D of Appendix A (see that appendix for more details). And that is the expression 
that we have finally used to compute the likelihood when scanning the CMSSM parameter 
space. 

We end up this subsection with a brief discussion of the third factor of the r.h.s. of 



eq.(4.4), i.e. the prior in the initial parameters. Admittedly, this is the less objective part 
of the statistical analysis, but one cannot simply ignore the prior. This would be equivalent 
to take a flat prior in the initial parameters, a choice which is as arbitrary as any other, 
unless one can give some argument of plausibility for it. Besides, in order to perform the 
marginalizations one should specify the ranges where the parameters live^. The dependence 
on the prior is actually a measure of the dependence of the results of the statistical analysis 
on a priori assumptions or prejudices. Note here that when the likelihood factor is very 
sharp, i.e. it distinctly selects a narrow region in the parameter space, the prior factor 
becomes irrelevant, since the posterior can only be sizeable where the likelihood is. But 
unfortunately we are not in that ideal situation yet, so there exists a dependence on the 
prior. Hence, the most conservative attitude is to use two different, though still reasonable, 
priors, and then compare the results. This gives a fair measure of the prior-dependence. 

For that matter we have considered two somehow standard types of prior: flat and 
logarithmic. In a flat (logarithmic) prior one assumes that, in principle, the typical size 
(order of magnitude) of the soft terms can be anything, say from 10 GeV up to Mx, with 
equal probability. In our opinion, a logarithmic prior is probably the most reasonable 
option, since it amounts to consider all the possible magnitudes of the SUSY breaking in 
the observable sector on the same foot (this occurs e.g. in conventional SUSY breaking by 
gaugino condensation in a hidden sector). However, we will consider log and flat priors at 
the same level throughout the paper, in order to compare the results and thus evaluate the 



prior-dependence. The precise forms of these priors can be found in sect. 2 of ref. 
together with a detailed discussion. 

4.2 Results 

We have computed the distribution of the posterior (|4.4D in the CMSSM parameter space 
using a modified version of the public SuperBayeS package ||l^^ adopting MultiNest v2.8 
|2C, ^] as a scanning algorithm. We use as running parameters a number of live points 



^Fortunately, in our case this point is irrelevant thanks to the Jacobian factor, J. As mentioned above, 
J automatically incorporates a fine-tuning penalization of the high-energy region of the parameters, which 
thus becomes irrelevant. For more details see 

®For this paper, the public SuperBayeS code has been modified to interface with Pythia 6.419 jl8|]. 
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niive = 2000 and a tolerance parameter tol = 1. Our final inferences for each of the 
log and flat priors are obtained from chains generated with approximately 10^ likelihood 
evaluations. 

We have also included in our likelihood the limits on the lightest Higgs and SUSY 
masses provided by LEP'^ and Tevatron . For details on the implementation see ref. 

For the marginalization procedure we have used [0, Mx] as the range for m, M1/2 and 
\A\. Besides, we have used 2 < tan/3 < 62. See a detailed discussion about this in sect. 2 
of ref. 111. 

In order to show the potential of the histogram-comparison technique, we have per- 
formed the analysis twice: switching off and on the shape test. 

Test for the total number of events 

First we compute the (LHC-part of the) likelihood associated with a particular point in 
the CMSSM parameter space by comparing the prediction for the total number of su- 
persymmetric events, satisfying the cuts specified in the previous subsection, with the 
"experimental" result for that number (i.e. after subtracting the SM background). 

This means that for each point examined we compute an histogram of events, Ui, but 
we just compare the total number, u = ^ Uj, with the experimental one, v = "^Vi, using 
the first factor of eq. ( |3.17 ) . More precisely, in order to incorporate the fact that the effective 



luminosity used in the simulation may change from point to point, we have actually used 
the -slightly modified- first factor of eq. (|6.4[) , 



LHC - likelihood oc V(f = , (4.9) 

Lu 

where L is the quotient of the experimental luminosity and the luminosity of the simulation. 
We recall that the V{f) function carries all the uncertainties affecting the total number 
of events, except the purely statistical ones (which are subdominant when that number is 
large), and is given by the gaussian ( p.lO]) with Aj = 0.5; as explained in the previous 
subsection. 

Fig. ^ (upper panels) shows the posterior pdf in the Mi/2—m plane, after marginalizing 
the rest of the parameters: A, tan f3, together with the previosly marginalized ^ and the 
nuisance SM parameters, {s}. As discussed below, the cross section for the kind of events 
considered (multijets + missing transverse momentum) is actually fairly insensitive to the 
values of A, tan /3, so the marginalization in these parameters does not change appreciably 
the probability density in the M]^/2~"^ plane. The left (right) panel corresponds to log (flat) 
priors for the soft terms. The shape of these plots can be easily understood. Since we are 
fitting a unique quantity, namely the total number of events, and we have two parameters, 
{Mi/2)?TT'}j we can expect a degeneracy in the parameter space, which is in fact the case. 
The elongated shape of the allowed region, especially visible in the flat prior case, is in 



^Recent LHC bound on the Higgs mass are still irrelevant to constrain the MSSM parameter space, 
though this situation will change soon [E2| 
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fact a widening -due to the uncertainties- of the hne where the degeneracy is exact, which 
includes of course the "true model", i.e. SU9. This is marked with a red diamond in the 
plots. Note that for small gaugino mass, the squark masses become irrelevant, provided 
they are large enough, since in that case the dominant SUSY production are gluino pairs, 
whose masses do not depend on m. This is reflected in the vertical form of the region for 
small Ml/2- 

Now, the shape of the line of degeneracy, somehow visible in the upper plots of Fig. |, 
depends on the cuts used to select events. Note that, as the values of the soft terms get 
smaller, the cuts used to bound the energy of the first and second jets, and the missing 
transverse momentum (see previous subsection) become more and more inappropriate: 
many events with three or more jets plus missing transverse momentum do not pass the 
cuts. As a consequence the counted total number of events of this kind is dramatically cut 
out and can become equal to the experimental one. This enhances artificially the statistical 
weight of the low energy region. As a result the maximum value of the pdf, and its averaged 
central value (marked by a green dot), are shifted from the "true model" (marked by a red 
diamond). 

There are ways to counteract these disagreeable effects. Playing with different cuts, the 
degeneracy gets partially broken and it is possible to discard larger regions of the parameter 
space. For instance, one can compare the total number of events using several choices for 
the lower bound on M(,s- Somehow, this equivales to test the shape of the experimental 
and theoretical histograms, but not in the most efficient way. This is improved using the 
histogram-comparison technique explained in sections 2, 3, which we will apply shortly to 
this analysis. 

Fig. |2| (lower panels) shows the posterior in the tan 13 — A plane, after marginalizing 
the rest of the parameters. As mentioned above, the cross section of the type of events 
considered does not depend appreciably on A and tan/3, and this is reflected in the plots. 
The preference for rather small values of both A and tan /3 is essentially a consequence of 



the Jacobian factor (|4.5| ) in the posterior (4^). As commented in the previous subsection 



the Jacobian automatically penalizes regions of the parameter space where fine-tuning is 
needed to reproduce the electroweak scale. This disfavors large values for both A and 
tan p.^ The remarkable insensitivity to A and tan /? is physically due to the fact that the 
CMSSM spectrum is not much dependent on the values of A and tan /3, except for mixing 
effects in the mass matrices of stops (and sbottoms and staus for large tan/3), charginos 
and neutralinos. Even for these matrices the effect is normally quite small. Thus the 
production rates of squarks and gluinos are quite independent of A and tan /3. Once the 
supersymmetric particles are created, their decay rates are not very relevant for the cross 
section of the process considered (multijets -|- missing transverse momentum), and, in any 
case, they are quite independent of these parameters too. This insensitivity to A and tan /3 
could be partially cured by complementing the present analysis by a separate study of 
those events involving leptons l23], but that discussion is outside the scope of this paper. 



*This is an statistical effect which is not visible in frequentist approaches, where the basic quantity is 
the likelihood and fine-tuning is not penalized, unless such penalization is artificially incorporated. 
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Figure 2: 2D marginalized posterior probability density function for flat (left panels) and loga- 
rithmic (right panels) priors using the normalization test. The inner and outer contours enclose 
respective 68% and 95% joint regions. The small filled circle represents the mean value of the 
posterior pdf, the cross corresponds to the best-fit point found and the diamond to the SU9 model, 
used to produce the mock experimental data. 



Incorporation of the shape test 

Now we repeat the analysis, but computing the hkelihood associated with the LHC data 
with the use of the whole expression ( ^.17 ), which takes into account not only the total 
number of events, but also the comparison of the histogram shapes. Again, in order to 
incorporate the fact that the luminosity of the simulation changes from point to point in 
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Figure 3: As Fig. 2 but using botli normalization and shape. Note the different ranges of the two 
figures. 



the parameter space we use the modified formula ( |6.4D : 
LHC - likelihood (x V{f 



Lu' 



K 



nl {Ui + Vi)\ f , I /V / V , , 

r'^iu"') i'^--^'^ ns.) 1(410) 



u 



We recall that V{g) carries all the systematic uncertainties affecting the shape of the his- 
tograms, and is given by the gaussian (3.11) with = 0.2; as explained in the previous 
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Figure 4: ID marginalized posterior probability density function of the m and M1/2 parameters 
(upper and lower panels respectively) for flat (left panels) and logarithmic (right panels) priors. 
The small filled circle represents the mean value of the posterior pdf, the cross corresponds to the 
best-fit point and the diamond to the SU9 model. 



subsection. Note that, as could be expected, the correction due to the difference in lumi- 
nosity does not affect the shape-part of the likelihood. 

Fig. ^ is as Fig. |2|, but after including the likelihood associated to the shape in the 
analysis. The upper panels show, for log and flat priors, the posterior pdf in the M1/2 ~ ^ 
plane, after marginalizing the rest of the parameters. As expected, the test of the theory is 
now much more efficient and the previous degeneracies dissapear (note the different ranges 
of the two figures). This illustrates the potential of making use of all the information 
contained in the theoretical and experimental histograms when computing the likelihood 
of a model, provided the various sources of uncertainty are properly taken into account. 
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The lower panels of Fig. ^ show the posterior in the tan 13 — A plane. Again, the cross 
section of the type of events considered does not depend appreciably on these parameters, 
which is reflected in the plots. Still, introducing the test for the shape slightly improves 
the sensitivity of the search to the values of A and tan/3, but that sensitivity is anyway 
very small. 

Figs. ^ and |5| show, for logarithmic and flat priors, the unidimensional posteriors 
for m, Ml/2; ^ ^-iid tan/3, after marginalization of all the parameters, except the one 
plotted in each graph. The shape of these functions reflects the previous discussion. It is 
worth-noticing the great precision in the determination of the gaugino mass, which comes 
from the fact that, due to the renormalization group running, M1/2 is the parameter that 
dominantly determines the low-energy spectrum of the CMSSM. 

Finally, we note that the posteriors have in all cases a very slightly dependence on the 
type of prior used, reflecting the robustness of the approach. 



5. Conclusions 

Due to the complexity of the LHC experiment, much of the comparison between LHC 
data and theoretical predictions has to be made by confronting experimental histograms 
(in different variables) and theoretical histograms produced by simulations. In many cases 
the comparison is performed by comparing the total number of events after choosing a 
clever variable and applying convenient cuts. Other techniques make use of particular 
features of the histograms, like the presence of endpoints. But the procedure can be 
optimized by evaluating the actual likelihood associated to the complete histogram. The 
main goal of this paper has been precisely to present a rigorous and effective method 
to compare experimental and theoretical histograms, evaluating the total likelihood, and 
apply it to a physically relevant case. In doing this we have taken into account that, 
besides the statistical uncertainties inherent to the histograms, there are additional sources 
of systematic error. 

In the method presented, the complete likelihood is rigorously separated into two 
factors: the likelihood of the total number of events and the likelihood of the shape of the 
histogram. This in turn allows to treat the corresponding sources of systematic uncertainty 
in a separate way as well. This is very convenient when there are reasons to expect different 
systematic errors in the two pieces. The final formula for the total likelihood is given in 
eq.(p7|). 

The procedure can be easily incorporated to both frequentist and Bayesian analyses, 
since both are based on the likelihood of the theoretical models. In the two approaches, 
incorporating the total likelihood optimizes the chances of picking up a signal of new 
physics and, once the signal is found, identifying which new physics is behind. E.g. if the 



- 21 - 



Cabrera etal. (2011) 




-2-1012 



A (TeV) 



Cabrera etal. (2011) 




-2-1 1 2 
A (TeV) 



Cabrera etal. (2011) 

T 1 



-| - n Flat priors - 




tan p 



Cabrera etal. (2011) 




tan p 



Figure 5: As Fig. 4 but for A and tan/3 parameters. 



new physics is super symmetry, it allows to find in an optimal way the parameters of the 
supersymmetric model. 

We have illustrated the latter point by showing how a search in the CMSSM parame- 
ter space, using Bayesian techniques, can effectively find the correct values of the CMSSM 
parameters by comparing histograms of events with multijets + missing transverse mo- 
mentum displayed in the effective- mass variable. The procedure is in fact very efficient 
to identify the true supersymmetric model, in the case supersymmetry is really there and 
accessible to the LHC. But, of course, the technique can be applied to any scenario of new 
physics. 
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6. Appendix A: Histogram comparison when experiment and the simula- 
tion have different luminosities 

Some expressions of sects. 2 and 3 have to be modified when the effective luminosity of 
the simulation is not the same as the experimental one. In practice, the former can change 
from point to point when scanning the parameter space since typically one simulates a 
fixed number of supersymmetric events (say 10^ events), but obviously the cross section 
changes throughout the parameter space. Of course one could adjust at every point the 
luminosity so that it coincides with the experiment, but normally this is costly in running 
time, and it is unnecessary, since the comparison can still be made as described next. 

Let us call L^^, L'^^p the luminosities of the theoretical simulation and the experiment, 
respectively, and suppose for a moment there are no systematic errors. Then the means 
that, under the null- hypothesis, are responsible for the experimental data, Vi, are not the 
ones of the simulation, say fii, but 

J^exp 

^» = Jlh k = L jli . (6.1) 

Hence eg. (^^) becomes 

P{v^\u^) = [fl dA.^^e-^'^^ ^e-A> =n^^^i±^ (l + L)-i— ^ . (6.2) 

Once systematic uncertainty is taken into account, see section^ everything is actually easy 
to handle since the luminosity factor L plays the role of a systematic and universal factor 
affecting the means of the simulation. More precisely, the equation ( |3.4D , that relates the 
true means to be compared with the experiment, //j, with those of the simulation, 
becomes 

= L f gi /if . (6.3) 

Therefore the subsequent equations remain the same with the simple change / — )• Lf . In 
particular, the likelihood given by eq.( p.l7|) becomes now 

This is the formula we have used in our scan of the CMSSM parameter space. 
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